Large Lexicons for Natural Language Processing: Utilising the Grammar Coding System of LDOCE

Author

  • Branimir Boguraev
Abstract

This article focusses on the derivation of large lexicons for natural language processing. We describe the development of a dictionary support environment linking a restructured version of the Longman Dictionary of Contemporary English to natural language processing systems. The process of restructuring the information in the machine readable version of the dictionary is discussed. The Longman grammar code system is used to construct 'theory neutral' lexical entries. We demonstrate how such lexical entries can be put to practical use by linking up the system described here with the experimental PATR-II grammar development environment. Finally, we offer an evaluation of the utility of the grammar coding system for use by automatic natural language parsing systems.

Similar resources

Acquisition of Large Scale Categorial Grammar Lexicons

A system is presented for inducing Categorial Grammar (CG) lexicons for natural language from either unannotated or minimally annotated corpora extracted from the Penn Treebank. A combination of symbolic and stochastic methods has been used to build a computationally effective and psychologically plausible system, which learns linguistically useful lexicons. There are a variety of parameters in...

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars

We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...

Lexical Disambiguation using Simulated Annealing

The resolution of lexical ambiguity is important for most natural language processing tasks, and a range of computational techniques have been proposed for its solution. None of these has yet proven effective on a large scale. In this paper, we describe a method for lexical disambiguation of text using the definitions in a machine-readable dictionary together with the technique of simulated ann...

Are Ontologies Involved in Natural Language Processing?

For certain disabled persons unable to communicate, we present a palliative aid which consists of a virtual pictographic keyboard associated with a text processing from a pictographic scripture. Words and the grammar are given as pictograms. The pictographic lexicon must be organized following the mental lexicon of the user to propose the pictograms of grammar in order to facilitate his task of wri...

Journal:
  • Computational Linguistics

Volume 13

Publication date: 1987